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Abstract 

A  new  dynamical  core  for  numerical  weather  prediction  (NWP)  based  on  the  spec¬ 
tral  element  method  is  presented.  This  paper  represents  a  departure  from  previously 
published  work  on  solving  the  atmospheric  primitive  equations  in  that  the  horizontal 
operators  are  all  written,  discretized,  and  solved  in  3D  Cartesian  space.  The  advan¬ 
tages  of  using  Cartesian  space  are:  the  pole  singularity  which  plagues  the  equations  in 
spherical  coordinates  disappears;  any  grid  can  be  used  including  lat-lon,  icosahedral, 
hexahedral  and  adaptive  unstructured  grids;  and  the  conversion  to  a  semi-Lagrangian 
formulation  is  easily  achieved.  The  main  advantage  of  using  the  spectral  element 
method  is  that  the  horizontal  operators  can  be  approximated  by  local  high-order  ele¬ 
ments  while  scaling  efficiently  on  distributed-memory  computers.  In  order  to  validate 
the  3D  global  atmospheric  spectral  element  model,  we  present  results  for  seven  test 
cases:  three  barotropic  tests  which  confirm  the  exponential  accuracy  of  the  horizontal 
operators  and  four  baroclinic  test  cases  which  validate  the  full  3D  primitive  hydro¬ 
static  equations.  These  four  baroclinic  test  cases  are:  the  Rossby-Haurwitz  wave 
number  4,  the  Held-Suarez  test,  and  the  Jablonowski- Williamson  balanced  initial 
state  and  baroclinic  instability  tests.  Comparisons  with  four  operational  NWP  and 
climate  models  demonstrate  that  the  spectral  element  model  is  at  least  as  accurate  as 
spectral  transform  models  while  scaling  linearly  on  distributed-memory  computers. 


1  Introduction 


Because  of  the  changing  trends  in  high  performance  computers  from  large  vector  ma¬ 
chines  to  distributed-memory  architectures,  numerical  methods  that  decompose  the 
physical  domain  into  smaller  pieces  have  been  receiving  significant  attention.  This 
new  focus  on  local  methods  is  especially  true  in  the  atmospheric  sciences  where  very 
large  models  covering  the  entire  globe  are  run  in  time-scales  ranging  from  days  (in 
numerical  weather  prediction)  to  thousands  of  years  (in  climate  simulations).  Fi¬ 
nite  difference  and  finite  element  methods  are  two  such  methods  which  decompose 
the  domain  locally  thereby  facilitating  their  implementation  on  distributed-memory 
computers.  However,  one  of  the  biggest  disadvantages  of  these  methods  is  that  tra¬ 
ditionally  they  have  not  been  able  to  compete,  in  terms  of  accuracy,  with  spectral 
transform  methods  which  are  typically  used  operationally  in  numerical  weather  pre¬ 
diction  (NWP)  and  climate  modeling.  For  example,  spectral  transform  models  are 
used  by  the  National  Center  for  Environmental  Prediction  (Sela  1980),  the  European 
Centre  for  Medium- Range  Forecasts  (Simmons  et  al.  1989),  the  National  Center  for 
Atmospheric  Research  (Hack  et  al.  1992),  and  the  U.S.  Navy  (Hogan  and  Rosmond 
1991). 

Spectral  element  methods  combine  the  local  domain  decomposition  property  of 
finite  element  methods  with  the  high-order  accuracy  of  spectral  transform  methods. 
In  other  words,  spectral  elements  are  as  local  as  finite  element  methods,  and  thereby 
can  be  used  as  efhciently  on  distributed-memory  computers  while  sustaining  the  same 
level  of  accuracy  obtained  with  spectral  transform  methods.  Spectral  element  meth¬ 
ods  have  been  used  successfully  for  the  shallow  water  equations  on  the  sphere  (Gi- 
raldo  2001;  Giraldo  et  al.  2002;  Taylor  et  al.  1997)  and  have  shown  to  be  promising 
for  ocean  and  climate  modeling  (Iskandarani  et  al.  2002;  Loft  et  al.  2001;  Thomas 
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et  al.  2002).  These  methods  are  essentially  high-order  finite  element  methods  where 
the  grid  points  are  chosen  to  be  the  Legendre-Gauss-Lobatto  (LGL)  points.  This 
choice  of  grid  points  allows  for  stable  high-order  interpolations  and  results  in  efficient 
numerical  integration  strategies  because  the  LGL  points  are  also  used  as  the  quadra¬ 
ture  points  in  the  numerical  integration  required  by  the  weak  integral  formulation 
common  to  all  Galerkin  methods. 

In  this  paper  we  extend  the  3D  Cartesian  spectral  element  method  for  the  spherical 
shallow  water  equations  introduced  in  Giraldo  (2001)  to  the  full  3D  primitive  hydro¬ 
static  equations  governing  the  motion  of  the  atmosphere.  This  method  represents  a 
radical  departure  from  all  previous  numerical  methods  for  flow  on  spherical  geometry 
in  that  the  horizontal  operators  are  written,  discretized,  and  solved  completely  in  3D 
Cartesian  space.  By  doing  so,  we  avoid  the  pole  singularity  problem  associated  with 
the  governing  equations  in  spherical  coordinates.  For  a  spherical  shell,  described  by 
the  coordinates  (A,  <^),  of  radius  a  the  divergence  of  a  vector  field,  F  —  fX  +  g(p^  is 
given  as 

^  1  \df  dgcoscp 

acosip  [9A  dp 

At  the  poles,  i.e.,  p  —  ±7r/2,  this  is  a  source  of  numerical  problems,  caused  by  the 
specific  coordinate  formulation  rather  than  the  nature  of  the  primitive  equations  and 
its  solutions.  While  the  use  of  a  local  Cartesian  coordinate  system  has  been  used 
to  overcome  these  problems  in  the  past  (Taylor  et  al.  1997)  we  have,  guided  by  the 
results  of  previous  work  (Giraldo  2001;  Giraldo  et  al.  2002),  chosen  to  maintain  the 
Cartesian  formulation  everywhere. 

Therefore,  in  our  formulation  the  poles  are  treated  as  any  other  point  in  Cartesian 
space.  Because  the  numerical  method  is  constructed  independently  of  the  grid,  this 
then  implies  that  any  grid  can  be  used  within  this  framework  including:  icosahedral, 
hexahedral,  lat-lon,  and  adaptive  unstructured  grids.  The  option  of  using  adaptive 
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unstructured  grids  will  facilitate  the  coupling  of  this  dynamical  core  with  the  Naval 
Research  Laboratory’s  (NRL)  mesoscale  model  (Hodur  1997).  The  independence  of 
our  numerical  methodology  from  the  grid  also  means  that  we  can  change  the  basis 
functions  from  continuous  to  discontinuous  as  we  showed  in  Giraldo  et  al.  (2002),  or 
the  elements  on  which  these  functions  are  constructed  from  quadrilaterals  to  triangles 
(Warburton  et  al.  2000)  which  then  simplihes  the  construction  of  adaptive  solutions. 
This  independence  from  the  grid  is  not  shared  by  any  of  the  existing  and  newly 
proposed  global  atmospheric  models  including  the  spectral  element  model  in  Taylor 
et  al.  (1997),  Loft  et  al.  (2001)  and  Thomas  et  al.  (2002),  and  the  icosahedral  model 
in  Randall  et  al.  (2002).  In  fact,  the  formulations  of  all  these  models  are  bounded 
on  a  specihc  class  of  grids.  Furthermore,  the  Cartesian  formulation  simplihes  the 
addition  of  semi-Lagrangian  schemes.  In  this  paper,  we  refer  to  our  current  model  as 
Eulerian  in  order  to  distinguish  it  from  the  semi-Lagrangian  version  we  are  currently 
testing  in  other  work.  In  brief,  the  objective  of  this  paper  is  to  show  the  feasibility 
of  the  Cartesian  spectral  element  formulation  for  constructing  hydrostatic  primitive 
equation  models  that  are  as  accurate  as  current  spectral  transform  models  and  more 
efficient  on  distributed-memory  computers. 

The  remainder  of  the  paper  is  organized  as  follows.  Section  2  contains  a  description 
of  the  governing  equations  of  motion  used  in  numerical  weather  prediction  models, 
along  with  a  detailed  dehnition  of  the  prognostic  and  diagnostic  variables  used  in 
our  model.  Section  3  contains  the  description  of  the  numerical  approximation  of  the 
equations  including:  the  horizontal,  vertical,  and  temporal  discretization  methods.  In 
Sec.  4  we  describe  the  tessellation  of  the  sphere  into  the  quadrilateral  elements  used  by 
the  spectral  element  method  to  construct  the  local  element  matrix  operations.  This 
leads  directly  into  Sec.  5  which  contains  a  discussion  on  the  domain  decomposition  of 
the  sphere  and  how  it  translates  into  the  implementation  of  the  model  on  distributed- 
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memory  computers  using  Message-Passing  Interface.  In  Sec.  6  we  present  the  results 
for  the  seven  test  cases  used  to  validate  our  model.  Finally,  in  Sec.  7  we  summarize 
the  key  hndings  of  this  research  and  discuss  the  direction  of  future  work. 


2  Atmospheric  Equations 


The  dynamics  of  a  hydrostatic  atmosphere  (i.e.,  dynamical  core)  are  governed  by 

^  +  V-(7r'u)  +  ^(7rd)  =  0,  (1) 

du  .  du  2ujz  n^P „ 

—  +  u-  Vu  +  a—  = - —{x  X  w)  -  -  c„0— Vtt  -  /r®,  (2) 

O'K 


dt 


da 


and 


-Ml  •  V6> -Fa—  =  0, 

ot  do 


dp- 


(3) 


(4) 


where  the  prognostic  variables  are  the  surface  pressure,  tt,  the  three  Cartesian  velocity 
components,  u  =  {u,  v,  w),  and  the  potential  temperature,  9.  The  diagnostic  variables 
are  the  vertical  velocity  d,  pressure  p,  and  geopotential  height  (p. 

In  Eq.  (2),  a  and  w  are  the  earth’s  radius  and  angular  velocity,  and  /r  is  a  Lagrange 
multiplier  used  to  constrain  the  fluid  particles  to  remain  on  each  spherical  shell  dehned 
by  the  vertical  coordinate  a;  we  shall  describe  the  role  of  the  Lagrange  multiplier  in 
detail  in  Section  3.4.1.  The  independent  variables  in  this  coordinate  system  are 
{x,y,  z,  a,t)  where  the  triple  {x,y,z)  represents  the  grid  point  on  the  sphere  dehned 
by  the  spherical  coordinates  (A,  (p)  and  are  related  by 


X  =  a  cos  A  cos 
y  =  a  sin  A  cos  (p 
z  =  asin<^. 
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Thus  in  Eqs.  (1),  (2),  and  (3)  V  is  defined  as 


at  constant  a. 

The  surface  pressure  variable  ,  tt,  in  the  governing  equations  is  defined  as 


3  The  Numerical  Scheme 

To  solve  Eqs.  (1),  (2),  and  (3)  we  split  the  spatial  operators  into  their  horizontal 
and  vertical  components.  Therefore  for  a  given  a  value,  we  discretize  the  horizontal 


6 


operators  defined  on  a  constant  o  spherical  shell  as  was  done  in  Giraldo  (2001)  using 
the  spectral  element  method.  The  vertical  operators  are  discretized  by  a  mass  and 
energy  conserving  flux-form  finite  difference  method.  We  begin  with  the  horizontal 
discretization  of  the  equations  by  the  spectral  element  method. 


3.1  Approximating  the  Solution  in  the  Horizontal  Direction 

3.1.1  Basis  Functions  and  Integration 


To  define  the  local  operators  which  shall  be  used  to  construct  the  global  approx¬ 
imation  of  the  solution  we  begin  by  decomposing  the  spherical  domain  into  iVe 
non-overlapping  quadrilateral  elements  such  that 

Ne 

=  U  fie- 

e=l 


To  perform  differentiation  and  integration  operations,  we  introduce  the  nonsin¬ 
gular  mapping  a?  =  T(^)  which  defines  a  transformation  from  the  physical  Cartesian 
coordinate  system  x  —  {x,  y,  z)  defined  in  Qg  to  the  reference  coordinate  system 
^  defined  in  each  element  where  (C^)  ^  iii  ^ach  element,  and 

C  =  1  on  the  surface  of  the  sphere. 

Associated  with  the  local  mapping,  T,  is  the  transformation  Jacobian,  J  — 
and  the  determinant 


01-^. G  ,  G-^X—  , 

where  G  represents  the  surface  conforming  component  of  the  mapping  (see  Giraldo 
2001  for  further  details). 

We  can  now  use  this  mapping  to  define  the  local  representation  of  the  solution,  q  — 
(tt,  u,  0),  and  the  approximation  of  operations  such  as  differentiation  and  integration. 
For  simplicity,  we  assume  C  to  be  unity  in  what  remains  and  denote  ^  =  (G^)- 
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The  simple  structure  of  the  reference  element,  I,  spanned  by  ^  G  [—1, 1]^,  makes 
it  natural  to  represent  the  local  element-wise  solution  q  by  an  iVth  order  polynomial 
in  ^  as 

{N+lf 


Qn{x)  =  ^k{x)qN{xk) 


(5) 


k=l 


where  Xk  represents  {N  -|-  1)^  grid  points  and  is  the  associated  multivariate 

Lagrange  polynomial.  The  logical  square  structure  of  I  simplifies  matters  in  that  we 
can  express  the  Lagrange  polynomial  by  a  tensor-product  as 


^k{x)  =  hi{i{x))  hj{ri{x)),  (6) 

where  i,j  =  0,  and  k  =  1, {N  -|-  1)^.  In  Eq.  (6)  h  are  the  one-dimensional 
Lagrange  polynomials 

N{N  + 1)  K  -  • 

where  Pn{0  i®  order  Legendre  polynomial.  For  the  grid  points  (6,%)  we 

choose  the  Legendre-Gauss-Lobatto  (LGL)  points,  given  as  the  tensor-product  of  the 
roots  of 

(i-aj^,K)=o  ■ 

This  choice  simplifies  the  construction  of  the  algorithm  because  the  LGL  points  are 
also  used  as  the  sampling  points  in  the  Gaussian  quadrature  rule  required  by  the 
numerical  integration  which  we  shall  describe  shortly. 

The  choice  of  the  LGL  points  enables  the  straightforward  approximation  of  local 
element  integrals,  i.e., 

q{x)dx  =  Y  ^ {vj)q{^i te ,Vj)\  , 

where  \J\  represents  the  local  Jacobian  for  the  transformation  between  fie  and  I,  and 
and  co{r]j)  are  the  Gaussian  quadrature  weights, 

=  iv(iv+i)  ’ 
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associated  with  the  one-dimensional  LGL  quadrature. 

Let  us  represent  the  governing  equations  by  the  simplified  form 

^  +  V-F  =  S(q)  (7) 

where  F  represents  the  fiux  tensor  and  S  the  source  terms  which  we  define  explicitly  in 
the  appendix.  Taking  the  weak  form  of  Eq.  (7)  with  respect  to  global  basis  functions 
T  gives 

and  substituting  for  q  and  F  for  by  the  global  polynomial  approximation  similar  to 
Eq.  (5)  yields  the  global  Galerkin  projection  of  the  governing  equations 

L  “  ^iS{q)dx  =  0  (9) 

where  I,  J  =  1,  ...,Np  with  Np  representing  the  number  of  grid  points  in  the  horizontal; 
we  shall  return  to  the  discussion  on  the  construction  of  the  global  solution  in  Sec. 
3.1.2.  Because  the  global  operators  given  in  Eq.  (9)  are  never  explicitly  defined  we 
begin  by  defining  the  local  operators  which  are  in  fact  constructed  and  then  used  to 
construct  the  action  of  these  global  operators  on  the  state  vector. 

Before  discussing  the  construction  of  the  global  solution,  let  us  first  describe  the 
local  element-wise  operators  which  are  used  to  construct  the  global  solution.  Let 

M-j  =  'il)i{x)^j{x)dx  (10) 

represent  the  mass  matrix  and 

=  ^i{x)Vi)j{x)dx  (11) 

the  differentiation  matrix,  where  ^^{x)  are  the  local  element  basis  functions  given 
in  Eq.  (6),  ?,  j  =  1, ...,  {N  -|-  1)^  are  the  number  of  grid  points  within  each  element 
Qe;  and  D  =  (H*,  D^)  is  a  vector  of  matrices  corresponding  to  the  three  spatial 

directions.  The  role  of  these  local  element  matrices  are  described  below. 
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3.1.2  Satisfying  the  Equations  Globally 


To  satisfy  the  equations  globally  requires  assembling  the  global  solution  by  virtue 
of  an  element-wise  construction.  This  element-wise  construction  is  based  on  the 
summation  of  the  local  element  matrices  to  form  their  global  representation.  This 
summation  procedure  is  known  as  the  global  assembly  or  direct  stiffness  summation 
and  is  depicted  graphically  in  Fig.  1.  In  this  figure,  the  local  element  matrices  given 
in  Eqs.  (10)  and  (11)  are  constructed  inside  each  of  the  four  elements  (El, . . . ,  E4) 
and  then  each  element  contributes  its  local  approximation  to  the  global  sum.  For 
example,  the  element  matrix  contributions  at  the  local  grid  points  4,  3,  2,  and  1 
of  elements  El,  E2,  E3,  and  E4  are  summed  in  order  to  construct  the  value  of  the 
global  matrix  at  the  global  grid  point  Gl.  It  should  be  understood  that  the  local  grid 
points  (4, 3, 2,1)  are  the  same  grid  point  which  are  claimed  by  different  elements  (and 
possibly  processors)  which  in  the  global  indexing  is  Gl.  Let  us  represent  this  global 
assembly  procedure  by  the  summation  operator 


JVe 

A  (12) 

e=l 

with  the  mapping  {i,e)  — >  (/)  where  i  =  1, . . . ,  (iV -|- 1)^  are  the  local  element 
grid  points,  e  =  1, . . . ,  iVe  are  the  spectral  elements  covering  the  spherical  shell,  and 
/  =  1, . . . ,  iVp  are  the  global  grid  points.  Applying  the  global  assembly  operator  to 
the  local  element  matrices  results  in  the  following  global  matrices: 


Ne 

M  =  f\M^ 

e=l 

for  the  mass  matrix,  and 

Ne 

D=  f\D^ 

e=l 

for  the  differentiation  matrix. 


(13) 


(14) 
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With  these  operators  defined  and  by  denoting  the  global  grid  vector  for  the  surface 
pressure  as  ttg,  the  wind  velocity  as  ug-,  the  potential  temperature  as  Og-,  and  the 
geopotential  height  as  (j)G  we  can  now  write  the  semi-discrete  approximation  to  Eqs. 


(1),  (2),  and  (3)  as  follows 

M  -h  (ttgWg)  +  M  |^(^^)|  =  0 

M  Dug+M  X  UG)j-D(f)G-CpOG 


(15) 


D'Kg 

G 


(16) 

M^  +  u5C«,  +  M{a^}^  =  0  (17) 

where  the  superscript  T  denotes  the  transpose  operation,  and  the  terms  {  }g  de¬ 
note  the  global  grid  vector  of  the  quantities  inside  the  brackets  after  they  have  been 
vertically  discretized.  It  should  be  noted  that  the  mass  matrix,  M,  is  diagonal  and 
thereby  trivial  to  invert.  The  diagonal  property  of  this  matrix  is  due  to  the  dual  role 
of  the  LGL  points  which  are  used  both  as  grid  points  and  quadrature  points.  Further¬ 
more,  the  global  matrix  D  is  never  actually  constructed,  but  rather  only  its  action  on 
the  state  vector  q  is  computed  by  virtue  of  the  local  element  matrix  and  the  global 
assembly  procedure.  We  now  address  the  discretization  of  the  vertical  operators. 


3.2  Approximating  the  Solution  in  the  Vertical  Direction 

The  equations  are  discretized  in  the  vertical  direction  using  a  conservative  flux-form 
finite  difference  method.  This  is  the  same  vertical  differencing  method  used  in  the 
Navy’s  Operational  Global  Atmospheric  Prediction  System  (NOGAPS)  which  is  the 
U.S.  Navy’s  current  global  atmospheric  NWP  model  (Hogan  and  Rosmond  1991; 
Rosmond  2000).  NOGAPS  is  used  by  the  U.S.  Navy  for  medium-range  weather 
forecasts  worldwide.  This  model  is  used  to  drive  the  U.S.  Navy’s  mesoscale  model 
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(Hodur  1997)  and  is  used  as  a  coupled  ocean-atmosphere  system  (Rosmond  et  al. 
2002).  NOGAPS  uses  the  spectral  transform  method  in  the  horizontal,  a  flux-form 
finite  difference  in  the  vertical,  and  a  semi-implicit  leapfrog  scheme  in  time.  The 
horizontal  resolution  of  NOGAPS  recently  increased  from  T159  with  24  vertical  levels 
to  T239  with  30  vertical  levels  (Hogan  et  al.  2002). 

Although  we  could  also  discretize  the  vertical  operators  in  SEE-AM  with  the 
spectral  element  method  we  have  chosen  to  use  the  finite  difference  method  in  order 
to  remain  as  similar  as  possible  to  NOGAPS.  This  will  ensure  that  any  differences  in 
the  results  are  due  only  to  discrepancies  in  the  discrete  horizontal  operators  between 
the  two  models.  We  hope  to  report  on  a  spectral  element  vertical  discretization  in 
future  work. 

To  simplify  the  proceeding  discussion,  let  us  define  the  vertical  integration  of  the 
global  grid  point  solution  vector  Qq  —  {ttg,Ug,0g)  to  be 


K 

/  Qg^c^  ^ 

Jo  K  T  1 


where  K  denotes  the  number  of  vertical  levels  to  be  integrated  across  and 


represents  the  thickness  of  the  vertical  layer. 

To  discretize  the  equations  in  the  vertical  direction  we  begin  by  integrating  Eq. 
(15)  across  all  the  vertical  levels  of  the  atmosphere.  Applying  no-flux  boundaries  at 
the  top  and  bottom  levels  of  the  atmosphere  results  in 

— —  =  -M~^  ^  D'^{'KgUg)k^ok  (19) 

K=1 

where  Wev  denotes  the  total  number  of  vertical  levels.  Once  the  surface  pressure 
tendency  is  computed,  the  vertical  velocity  a  at  each  vertical  level  is  obtained  by 
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integrating  Eq.  (15)  from  the  top  of  the  atmosphere  (a  =  0)  to  the  desired  vertical 
level  K  thus  giving 


{'Kg^g)i 


-cr^+i  -  M  ^  ^  {'KgUg)k^OK 


where  K  denotes  a  full  level  and  K  ±  \  denote  the  half  levels  of  this  staggered  sigma 
coordinate  system.  The  prognostic  variables  and  (j)  all  reside  at  the  full  levels  while  the 
diagnostic  variables  reside  at  the  half  levels.  The  top  and  bottom  of  the  atmosphere 
are  at  the  half  levels  K  —  \  and  K  —  iViev  +  respectively.  Figure  2  illustrates  the 
sigma  coordinate  system  and  the  location  of  the  prognostic  and  diagnostic  variables. 

The  vertically  differenced  terms  of  the  global  solution  vector  Qq  are  computed 
in  the  following  manner 


■dqg 

da 


-^K+\ 


{^g)k+^  {Qg)k 


+  d^_i 


{Qg)k  -  {Qg)k- 
<^K  —  <7^-1 


where 


{'^g)k+\  -  2  (('“g)k+i  +  {ug)k) 


{0g)k+^  -  {^g)k 


^K+- 

p  '  P  +  («o)i 

-TfC+l  ~  -oc  / 


Pk+1  -  Pk+\ 
Pk+1  —  Pr 


These  interpolation  stencils  are  chosen  in  order  to  enforce  the  numerical  scheme  to 
conserve  energy.  Finally,  the  hydrostatic  equation,  Eq.  (4),  is  discretized  as  follows 

4>k  ~  4>k+i  —  Cp{Gg)k  {Pk+^  ~  Pk)  +  Cp{(^g)k+i  [Pr+i  —  Pr+^  (22) 


where  the  Exner  functions  used  are 


and  Pk 


1  P 


K  +  IJ  p'^  [Pk+i-Pr-I 
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3.3  Temporal  Discretization 

Discretizing  the  semi-discrete  system,  Eqs.  (19),  (20),  (16),  and  (17)  in  time  by  an 
explicit  Eulerian  leapfrog  method  yields 

_  qj-n  -^lev 

M  ^  ^  {'KgUgTk^ok  (24) 

K=1 


{t^gOgTk+I:  -  - 


TT, 


n+1 

G 


—  TT, 


K 


G 


2At 


(^K+h  ~  ^  {t^gUgTl^ol 


(25) 


L=1 


M  ^ 

2At 


^  -  [ug  Dug  +  M 


du  1 


f^UJZ  \  ( 

-h  M  (  — ^(ccg  X  ug)  I  +  D(1)g  +  Cp9G  —  I  Dttg 


G 


yd'K 


G 


(26) 


/irz+l  _  f\n  / 

M  ^  ^  =  -  I  w?  DBg  +  M  {a 


2At 


.  de] 


(27) 


Because  the  leapfrog  method  is  an  explicit  time  differencing  scheme  it  does  require 
a  stringent  time-step  restriction.  In  order  to  maintain  stability  throughout  long  time- 
integrations  (up  to  1200  days  for  the  Held-Suarez  test  case)  we  use  a  Courant  number 
of  1.  We  base  this  time-step  on  the  following  definition  of  Courant  number 


_  -^max 

As 


where 

As  =  Ax^  +  Ar/2  -h  Az^ 

is  the  physical  spacing  of  the  grid  and  A^ax  is  the  maximum  wave  speed  of  the  atmo¬ 
spheric  equations  (see  the  appendix  for  a  derivation  of  this  characteristic  velocity). 
The  physical  spacing  of  the  grid  As  scales  as 


As  oc 


1 
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where  is  the  number  of  spectral  elements  comprising  the  grid  and  N  is  the  order 
of  the  polynomial  approximation  inside  each  element. 

Because  this  temporal  discretization  method  produces  a  computational  mode  we 
apply  the  time-averaged  Asselin  hlter  (Asselin  1972) 

+  0.02  (gS+‘  -  2«S  + 

to  the  global  solution  vector  Qq  at  the  end  of  each  time-step.  While  this  is  not  the 
most  sophisticated  time  discretization  method  available,  we  have  used  it  in  order  to 
keep  our  model  as  similar  to  NOGAPS  as  possible.  NOGAPS  uses  a  semi-implicit 
time  discretization  but  this  is  applied  as  a  correction  to  the  explicit  leapfrog  scheme. 
Future  research  involves  the  addition  of  semi-implicit  Eulerian  and  fully-implicit  semi- 
Lagrangian  methods  to  the  current  formulation  which  will  hopefully  allow  an  increase 
in  the  time-step  by  a  factor  of  10. 

3.4  Lagrange  Multiplier  and  the  High-Pass  Filter 

3.4.1  Constraining  the  Momentum 

Because  we  are  using  Cartesian  rather  than  spherical  coordinates  we  must  carry 
three  momentum  equations  (in  addition  to  the  equation  of  vertical  motion);  however, 
because  flow  on  a  spherical  shell  is  really  only  two-dimensional  (at  each  a  level) 
the  fluid  particles  are  allowed  an  extra  degree  of  freedom.  This  degree  of  freedom 
will  manifest  itself  in  fluid  particles  flying  off  the  spherical  shell.  Mollifying  this 
undesirable  situation  requires  constraining  the  velocity  held  to  be  tangential  to  the 
sphere.  At  each  grid  point  we  apply  the  following  constraint 

+  i^x  (28) 
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where  the  subscripts  c  and  u  denote  the  constrained  and  unconstrained  horizontal 
wind  velocities  (see  Cote  1988  for  further  details).  For  a  fluid  particle  to  remain  on 
the  spherical  shell,  the  wind  velocity  must  be  orthogonal  to  the  position  vector  of  its 
grid  point;  that  is 

u  ■  X  —  Q, 

which  results  in  the  Lagrange  multiplier 


X  •  u 


n+l 


u 


Equation  (28)  can  now  be  written  as 


u 


n+l  _ 


Vu 


n+l 


where 


7  2  2 

a  —  X  —xy 


■P  =  — 
'  2 


-xy 
y  —xz 


a^-y^ 


-yz 


—xz 


-yz 


-  z"^ 


(29) 


is  the  projection  matrix  which  constrains  vector  quantities  to  be  tangential  to  the 
sphere.  It  should  be  pointed  out  that  using  this  Cartesian  formulation  introduces  no 
approximation  from  the  original  governing  equations  in  spherical  coordinates.  Swarz- 
trauber  et  al.  (1997)  have  shown  that  the  equations  in  Cartesian  coordinates  with 
this  type  of  projection  are  in  fact  identical  term-by-term  to  the  equations  in  spherical 
coordinates. 


3.4.2  High-Pass  Filter 

Like  any  high-order  method,  the  spectral  element  method  is  susceptible  to  aliasing 
errors.  In  order  to  prevent  these  high  frequency  waves  from  contaminating  the  solution 
through  the  introduction  of  non-physical  oscillations  a  high-pass  hlter  is  used.  We 
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use  the  filter  outlined  in  Boyd  (1998)  which  is  applied  as  follows.  In  one  dimension, 
we  expand  the  state  vector  q  in  the  ^  direction  as  follows 


Q  =  Lq{0  (30) 


where 


'  Poi^o) 

Pli^o)  P2{^o)  -  Poi^o)  ■ 

••  Pi{^o)-i 

^i-2(^o) 

•  •  Pn{^o)  —  Pn-2{^o) 

Lij  — 

Poi^j) 

Pli^j)  P2{^j)  -  Poi^j)  • 

••  p^{Q-l 

1  Po{^n) 

Pi{^n)  P2{^n)  -  Po{^n)  ■ 

••  Pi{^N)-l 

■  ■  Pn{^n)  —  Pn-2{^n) 

(31) 

is  the  the  Legendre  transform  matrix,  Pi  are  the  Nth  order  Legendre  polynomials, 
and  q  are  the  Legendre  modal  coefficients.  To  hlter  the  local  solution  q  we  transform 
them  to  modal  space  via  the  inverse  of  Eq.  (31),  apply  the  hlter  weighting  diagonal 
matrix  A,  and  then  transform  back  to  nodal  (grid  point)  space.  This  can  be  written 
as 

q]P  =  Fq  (32) 

where 

F  =  LAL-^  (33) 

is  the  hlter  operator  which  is  an  (A^  +  1)  x  (A^  +  1)  matrix.  The  success  of  the 
hlter  hinges  on  the  weighting  matrix  A.  Following  the  idea  of  P.F.  Fischer  (private 
communication)  we  write 

1  for  i  <  ip 

A,  =  <^  2  Vie[0,A^]  (34) 

where  i  =  ip,  ...,N  denotes  the  modes  curtailed  by  the  hlter.  In  this  paper  we  use 
ip  =  N  with  II  =  0.05  which  represents  only  hltering  the  highest  mode  by  5%. 
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The  reason  for  constructing  the  Legendre  transform  as  Pi  —  Pi-2  for  i  =  2, 
and  the  weighting  matrix  A  =  1  for  i  =  0, 1  is  to  avoid  affecting  the  local  element 
boundary  values.  By  using  this  Legendre  polynomial  construction,  only  the  first 
two  modes  (Pq  and  Pi)  affect  the  element  boundary  values;  the  remainder  of  the 
modes  only  affect  the  element  interior  values.  This  way  we  can  apply  the  filter  in 
an  element-by-element  sense  without  violating  the  (7°  continuity  condition  at  the 
element  interfaces  (boundaries).  This  eliminates  the  need  for  global  assembly  of  the 
local  element  filter  matrix.  The  global  assembly  operation  incurs  communication 
costs  on  distributed-memory  computers  which  must  be  minimized  to  achieve  good 
performance. 

In  2D,  the  filter  is  applied  as  follows 

q],  =  Fq^F'^  (35) 

where  F^  is  the  transpose  of  F  and  q®  is  an  (A^  -|-  1)  x  (A^  -|-  1)  matrix  containing  the 
solution  vector  of  the  element  Qg. 


4  Grid  Generation  on  the  Sphere 

One  of  the  advantages  of  using  Cartesian  coordinates  is  that  any  grid  can  be  used  with 
our  spectral  element  atmospheric  model.  Although  we  can  use  any  grid  whatsoever, 
at  the  moment  the  grids  must  be  conforming  and  quadrilaterally  shaped.  Using 
the  discontinuous  basis  functions  as  in  Giraldo  et  al.  (2002)  will  permit  using  non- 
conforming  grids  and  the  spectral  element  basis  in  Warburton  et  al.  (2002)  will 
allow  the  use  of  triangles  which  we  reserve  for  future  work.  In  order  to  show  the 
grid  independence  of  our  model,  in  this  paper  we  show  results  on  icosahedral  and 
hexahedral  grids. 
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4.1  Icosahedral  Grids 


Icosahedral  grids  are  constructed  by  subdividing  the  20  triangular  faces  of  the  icosa¬ 
hedron  by  a  Lagrange  polynomial  of  order  nj  as  described  in  Giraldo  et  al.  (2002). 
Prior  to  mapping  these  elements  onto  the  sphere  it  is  convenient  to  map  the  triangles 
onto  a  gnomonic  space.  The  most  unbiased  mapping  is  obtained  by  mapping  about 
the  centroid  of  the  triangles. 

Let  (Ac,  (fc)  be  the  centroid  of  the  triangle  we  wish  to  map.  The  gnomonic  mapping 
is  then  given  by 


a  cos  6  sin(A  —  Ac) 
sin  9f.  sin  6  -|-  cos  9c  cos  9  cos  (A  —  Ac) 
a  [cos  9c  sin  9  —  sin  9c  cos  9  cos(A  —  Ac)] 
sin  9c  sin  9  -|-  cos  9c  cos  9  cos(A  —  Ac) 


(36) 


To  simplify  matters  a  bit,  we  first  apply  the  rotation  mapping  R  whereby  Eq.  (36) 
becomes 


x  =  atanAR,  r/ =  a  tan  sec  Ar  , 


(37) 


in  the  new  coordinate  system  with  the  coordinates  (A,  <^)  located  at  (0,0).  The 
rotation  mapping,  R,  is  defined  as  follows 


Vr 


arctan 


cos  <^sin(A  —  Ac) 

sin  (fc  sin  (p  -|-  cos  Pc  cos  p  cos  (A  —  Ac) 


arcsin  [cos  Pc  sin  p  —  sin  pc  cos  p  cos(A  —  Ac)] . 


Once  the  triangular  icosahedral  grid  is  constructed,  we  subdivide  each  triangular 
element  into  3  quadrilateral  elements.  Upon  dividing  the  triangles  into  quadrilaterals 
one  can  construct  the  higher  order  LGL  grid  points  inside  each  element  resulting  in 
a  quadrilateral  grid  with  the  following  properties 

TVp  =  60(njiV)^  +  2 
TVe  =  60(nj)^ 
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(38) 

(39) 


where  Np  and  denote  the  total  number  of  grid  points  and  elements  comprising 
the  icosahedral  grid  and  N  is  the  polynomial  order  of  the  elements.  Examples  of 
corresponding  grids  ioi  nj  —  2  N  —  8  and  nj  —  A  N  —  8  are  illustrated  in  Fig.  3. 


4.2  Hexahedral  Grids 


Hexahedral  grids  are  constructed  by  subdividing  the  six  faces  of  a  hexahedron  into  the 
desired  number  of  quadrilateral  elements  and  then  mapping  these  onto  the  sphere. 
We  begin  by  constructing  a  spectral  element  grid  on  the  gnomonic  space  G.  G  is 
dehned  by  the  square  region  —  [— in  a  2D  Cartesian  space  (Komatitsch 
and  Tromp  2002;  Ronchi  et  al.  1996).  This  region  is  divided  into  the  elements  and 
inside  each  element  we  construct  the  LGL  grid  points.  Upon  constructing  this  grid 
we  then  map  the  gnomonic  coordinates  to  the  corresponding  spherical  coordinates, 
Xg,  via 


^G 


arcsin 


tan  rjG 


\Jl  +  tan^^G  +  tan^ryc, 

It  should  be  noted  that  Xq  only  gives  the  spherical  coordinates  of  one  of  the  six 
faces  of  the  hexahedron.  Therefore,  we  have  to  rotate  this  face  to  the  six  faces  of  the 
hexahedron  by  the  rotation  mapping  R 

^  cos  (fa  sin  ^g  ^ 


A  =  Ac  +  arctan 


cos  (pc  cos  Xg  cos  (fc  —  sin  pg  sin  Pc 
arcsin  (sin  pg  cos  pc  +  cos  pg  cos  Xg  sin  pc) 


(40) 


where  the  centroids,  (Ac,<^c),  of  the  six  faces  are  located  at  {X^Pc)  —  ([c  ~  1] 
for  c  =  1, . . . ,  4  and  (A5,  <^5)  =  ^0,  ,  (Ae,  Po)  —  (o,  — |)-  This  approach  results  in 

the  construction  of  the  hexahedral  grid  with  the  following  properties 


Np  —  G{nHN)‘^  +  2 


(41) 
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iVe  =  6(»hF 


(42) 


where  Ap  and  denote  the  number  of  grid  points  and  elements  comprising  the  grid. 
The  parameter  uh  refers  to  the  number  of  quadrilateral  elements  in  each  direction 
and  rjc)  contained  in  each  of  the  six  faces  of  the  hexahedron  and  N  is  the  polynomial 
order  of  the  elements.  Figure  4  shows  the  grids  njj  =  4  iV  =  8  and  uh  =  8  N  =  8. 
The  hexahedral  resolution  H  where 

H  =  uhN  (43) 

has  approximately  the  same  number  of  grid  points  as  the  spectral  triangular  trunca¬ 
tion  T  on  a  Gaussian  lat-lon  grid;  these  two  different  grid  resolutions  are  related  by 
the  expression 

H  r^T  +  l  (44) 

To  derive  this  relations  requires  a  few  dehnitions.  Let  the  number  of  grid  points  in  a 
spectral  model  be  given  by 

Np  =  ^lan  •  Niat  = 

where  and  Ni^t  denote  the  number  of  points  in  the  longitudinal  and  meridional 
directions  in  a  Gaussian  grid.  In  addition,  let  the  spectral  triangular  truncation  be 
given  by  T  =  —  1  which  results  in  the  number  of  grid  points  to  be 

=  \(T  +  !)"• 

Equating  this  expression  to  Eq.  (41)  yields  if  ~  ^(T  -|-  1)  which  we  approximate  by 
Eq.  (44).  Of  course,  there  are  other  ways  of  relating  the  resolution  of  different  grids 
but  certainly  using  the  number  of  grid  points  is  the  most  logical  approach.  With  this 
relationship  between  spectral  triangular  truncation,  T,  and  hexahedral  resolution, 
if,  we  can  obtain  the  equivalent  hexahedral  resolution  for  the  NOGAPS  operational 
resolution  of  T239  with  uh  =  30  and  N  =  8  (yielding  if 240). 


21 


5  Parallel  Implementation 


In  this  section  we  discuss  the  issues  concerning  the  implementation  of  SEE-AM  on 
distributed-memory  computing  platforms.  Let  us  begin  by  describing  the  domain 
decomposition  strategy. 

5.1  Domain  Decomposition 

Because  our  implementation  of  the  spectral  element  method  is  completely  indepen¬ 
dent  from  the  grid  we  are  free  to  choose  any  grid;  however  in  order  to  simplify  the 
discussion  of  our  model  we  describe  the  domain  decomposition  as  it  pertains  to  hexa- 
hedral  grids  only.  To  construct  a  hexahedral  grid  we  map  the  six  faces  of  a  hexahedron 
onto  a  sphere.  Therefore,  the  logical  partitioning  of  the  domain  is  the  decomposition 
of  the  spherical  domain  into  the  six  faces  of  the  hexahedron.  In  keeping  with  this 
simple  decomposition  strategy  we  then  further  subdivide  the  domain  into  perfectly 
square  regions.  In  other  words,  the  following  partitions  are  possible 

Nproc  =  Q{npf  (45) 

where  A^proc  is  the  total  number  of  processors  and  np  represents  the  partitioning  of 
processors  in  each  of  the  two  Cartesian  directions  on  each  of  the  six  faces  of  the 
hexahedron.  In  addition,  we  require  the  following  constraint  on  np 

np  <  np 

which  states  that  the  number  of  processors  cannot  exceed  the  number  of  spectral 
elements. 

This  is  by  no  means  the  only  possible  partitioning  strategy.  A  more  sophisticated 
approach  is  to  use  the  Metis  graph  partitioning  software  (Karypis  et  al.  1998)  or 
the  space- filling  curves  strategy  presented  in  Dennis  (2003).  We  merely  present  this 
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ad  hoc  hexahedral  grid  domain  decomposition  strategy  as  a  proof-of-concept  that 
SEE-AM  performs  efficiently  on  distributed-memory  computers.  In  tfie  future,  we 
plan  on  implementing  tfie  space-filling  curves  strategy  wfiicfi  will  simplify  tfie  domain 
partitioning  and  allow  us  to  use  any  grid  wfiatsoever. 

5.2  Communication 

From  tfie  description  of  tfie  global  assembly  procedure  given  in  Section  3.1.2  it  should 
be  apparent  that  the  communication  in  the  spectral  element  method  results  from  the 
summation  of  the  local  element  matrices  to  construct  the  global  matrices.  In  order  to 
better  understand  the  communication  that  takes  place  across  neighboring  processors 
let  us  look  at  only  one  face  of  the  hexahedral  grid.  For  the  sake  of  argument,  let  us 
assume  that  rip  =  3  meaning  that  there  are  nine  processors  per  face.  This  situation 
is  illustrated  in  Fig.  5.  Furthermore  let  us  assume  that  np  =  9,  that  is  there  are  81 
elements  per  face.  From  the  figure  it  is  evident  that  there  will  then  be  nine  elements 
inside  each  processor  denoted  by  A^l, . . . ,  N9  for  the  neighbors  and  PI, ... ,  P9  for 
the  on-processor  elements.  The  processors  are  denoted  by  the  thick  lines  while  the 
thin  lines  represent  the  spectral  elements.  However,  in  order  to  keep  the  discussion 
as  general  as  possible,  we  shall  not  define  the  order  of  the  polynomial,  N,  inside  each 
element. 

Because  of  the  (7°  continuity  condition  required  by  the  spectral  element  method, 
the  four  corner  points  of  processor  PROC  in  Fig.  5  are  each  shared  by  four  processors: 
PROC  plus  three  neighboring  processors.  Similarly,  edge  points  are  shared  by  two 
processors.  This  continuity  condition  is  satisfied  by  the  global  assembly  procedure 
which  is  executed  as  follows  across  processors.  In  processor  PROC  the  on-processor 
right-hand-side  (RHS)  vectors  of  Eqs.  (24),  (26),  and  (27)  are  constructed.  For  the 
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interior  grid  points  of  PROC  these  RHS  vectors  represent  the  globally  assembled  RHS 
vectors  of  the  equations;  however,  for  the  boundary  grid  points  they  are  only  a  portion 
of  the  global  RHS  vectors.  Thus,  processor  PROC  requires  the  boundary  grid  point 
RHS  vectors  from  processors  NBHl, . . . ,  NBH8  to  complete  the  global  assembly  of 
its  RHS  vectors.  Once  these  global  RHS  vectors  are  constructed  each  processor  can 
then  solve  independently  for  the  global  solution  via  Eqs.  (24),  (26),  and  (27).  Note 
that  Eq.  (25)  is  solved  on-processor  without  communication  after  Eq.  (24)  is  solved. 

As  an  example  of  how  the  global  solutions  of  the  processor  corner  points  are 
obtained  let  us  describe  the  procedure  for  the  bottom  left  grid  point  of  PROC  in 
Fig.  5.  To  construct  the  global  solution  at  this  grid  point  requires  knowing  the 
contribution  of  the  on-processor  assembled  RHS  vectors  from  the  neighbors  NBRl, 
NBR2,  and  NBR8.  Therefore,  each  processor  computes  its  on-processor  assembled 
RHS  vectors  locally,  and  then  sends  its  perimeter  values  to  its  eight  neighboring 
processors.  This  results  in  a  message  approximately  of  the  size 

20  (iV  +  l)iVieva  (46) 


where  the  ratio 

Uh 
a  =  — 

Up 


(47) 


represents  the  number  of  elements  per  processor  in  each  of  the  two  Cartesian  directions 
on  each  of  the  six  faces  of  the  hexahedron.  Using  Eq.  (42)  and  (45)  we  can  rewrite 
Eq.  (47)  as 


a  = 


(48) 


Equation  (46)  illustrates  that  the  message  size  scales  linearly  with  N,  A^iev,  and  a;  the 
constant  in  Eq.  (46)  arises  from  each  processor  having  four  edges  and  the  primitive 
equations,  in  Cartesian  coordinates,  having  hve  prognostic  variables.  Thus  at  every 
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time-step  the  perimeter  values  of  the  full  3D  RHS  vectors  of  each  processor  are  sent 
to  its  neighbors. 

The  communication  described  above  is  exact  for  all  processors  that  do  not  contain 
one  of  the  eight  corner  points  of  the  hexahedron.  In  Fig.  5  processor  NBH5  only 
has  seven  neighboring  processors  to  communicate  with  due  to  the  topology  of  the 
hexahedron.  In  addition,  for  the  special  case  uh  —  1  each  processor  has  only  four 
neighbors. 

5.3  Performance 

5.3.1  Model  Scalability 

One  of  the  main  advantages  of  using  spectral  element  methods  over  spectral  trans¬ 
form  methods  is  that  for  an  equivalent  resolution  the  spectral  element  method  allows 
the  use  of  far  more  processors.  As  an  example  let  us  compare  SEE-AM  with  NO¬ 
GAPS  which  uses  the  spectral  transform  method  in  the  horizontal.  The  most  efficient 
decomposition  for  NOGAPS  is  through  a  ID  decomposition  along  latitude  rings.  It 
should  be  noted,  however,  that  in  general  2D  decompositions  are  more  efficient  for 
spectral  transform  models  as  shown  in  Foster  et  al.  (1992).  We  shall  only  be  compar¬ 
ing  the  operational  version  of  NOGAPS  with  SEE-AM  and  it  should  be  understood 
that  the  discussion  in  this  section  does  not  necessarily  extend  to  all  spectral  transform 
models  but  it  should  provide  some  reasonable  estimates. 

Using  a  ID  domain  decomposition  the  maximum  number  of  processors  that  NO¬ 
GAPS  can  use  is 

^Woc  —  ^lat  ~  2^ 

where  Wat  denotes  the  number  of  latitude  rings  and  T  the  resolution  of  the  spectral 
triangular  truncation.  In  contrast,  the  maximum  number  of  processors  that  SEE-AM 
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can  use  is 


JVp"..  =  (60) 

where  we  have  used  Eqs.  (42)  and  (43)  in  order  to  simplify  the  expression  and  write  it 
as  a  function  of  hexahedral  resolution,  H.  In  other  words  SEE- AM  can  use  as  many 
processors  as  there  are  elements.  Thus  for  fixed  N  the  number  of  processors  allowed  by 
SEE-AM  increases  quadratically  with  resolution,  H,  while  only  linearly  for  NOGAPS. 
At  the  operational  T239  resolution  NOGAPS  can  use  360  processors  whereas  SEE- 
AM  (assuming  uh  —  30  and  N  —  8)  can  use  5400  processors;  a  fifteen-fold  increase 
in  the  number  of  processors.  Equation  (50)  shows  that  if  we  wish  to  further  increase 
the  number  of  processors  with  SEE-AM  we  simply  increase  uh  while  decreasing  N 
accordingly  in  order  to  maintain  the  horizontal  resolution  fixed.  Therefore  we  could 
use  Uh  —  60  and  A^  =  4  for  a  total  of  21600  processors;  a  sixty-fold  increase  in  the 
number  of  processors.  However,  decreasing  N  will  impact  the  solution  accuracy  and 
the  issue  of  efhciency  versus  accuracy  must  be  carefully  weighed.  The  point  here  is 
that  the  spectral  element  method  offers  this  flexibility  to  increase  either  the  accuracy 
or  efhciency  -  a  luxury  not  shared  by  the  spectral  transform  method.  We  leave  the 
detailed  discussion  of  the  issue  of  efhciency  versus  accuracy  to  future  work  but  at  this 
point  we  anticipate  using  N  —  8;  even  with  N  —  8  SEE-AM  accommodates  many 
more  processors  than  NOGAPS. 

5.3.2  Comparison  with  NOGAPS 

The  rate  of  boating  point  operations  for  SEE-AM  increases  as 

0  (nliV^Wev)  (51) 

which  corresponds  to  the  construction  of  derivatives  and  the  application  of  the  hlter 
in  the  spectral  element  method.  In  contrast  the  rate  of  boating  point  operations  for 
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NOGAPS  increases  as 


o  (<iV,e.)  (52) 

which  corresponds  to  the  computation  of  the  Legendre  transform  involving  the  merid¬ 
ional  direction.  The  Legendre  transform  has  been  the  bottleneck  of  the  spectral  trans¬ 
form  method  due  to  the  lack  of  a  fast  transform  such  as  the  fast  Fourier  transform 
(FFT)  of  Cooley  and  Tukey  (1965)  for  the  zonal  direction. 

In  order  to  compare  the  rates  of  operations  between  SEE- AM  and  NOGAPS  we 
rewrite  Eqs.  (51)  and  (52)  as  functions  of  resolution,  H  and  T,  as  follows  for  SEE-AM 

0  (iL'A^ievA")  (53) 


and  for  NOGAPS 

O  (tW,„)  . 


(54) 


Thus  the  cost  increases  cubically  with  resolution,  T,  for  NOGAPS  while  it  only 
increases  quadratically  for  SEE-AM,  H,  provided  that  N  remains  hxed. 

To  understand  why  the  spectral  transform  method  becomes  increasingly  more 
expensive  than  the  spectral  element  method  requires  revisiting  Eqs.  (51)  and  (52). 
For  the  spectral  element  method,  the  horizontal  resolution  is  governed  by  uh  and  N. 
Recall  that  uh  governs  the  number  of  spectral  elements  while  N  is  the  order  of  the 
polynomials.  This  last  term  is  analogous  to  A^iat  in  Eq.  (54)  for  the  spectral  transform 
method.  To  increase  the  horizontal  resolution  of  a  spectral  transform  model  requires 
increasing  A^iat  which  increases  the  cost  by  its  cube;  there  is  no  way  around  this.  In 
contrast,  with  the  spectral  element  method  one  has  the  choice  of  increasing  either  the 
number  of  elements  or  the  order  of  the  polynomial.  Since  the  cost  increases  cubically 
with  N  and  only  quadratically  with  uh  then  it  makes  sense  to  keep  the  polynomial 
order  hxed  and  increase  the  number  of  elements  to  obtain  higher  resolutions.  This 
hexibility  is  due  to  the  h-p  nature  of  the  spectral  element  method.  By  keeping  uh  hxed 


27 


(say  Uh  —  ^)  and  increasing  N  we  reach  the  spectral  transform  limit  of  the  spectral 
element  method  (known  as  the  p-type  method).  On  the  other  hand,  by  keeping  N 
fixed  (say  N  —  1)  and  increasing  Uh  we  obtain  the  linear  finite  element  limit  of  the 
spectral  element  method  (known  as  the  h-type  method).  On  serial  computers  the 
optimal  strategy  for  selecting  Uh  and  N  is  usually  to  pick  N  in  the  range  [8, 16]  and 
increase  Uh  to  yield  the  desired  resolution. 

The  rates  reported  in  Eqs.  (53)  and  (54)  are  based  on  a  per  time-step  basis; 
however,  in  practice  the  spectral  transform  method  admits  a  much  larger  time-step 
than  the  spectral  element  method.  This  is  due  to  the  time-step  stability  limits  being 
different  for  the  two  models.  For  NOGAPS  the  time-step  must  be  decreased  linearly 
with  horizontal  resolution  T, 

oc  — 

T’ 

while  for  SEE-AM  the  time-step  scales  as 


oc 


1  _  1  1 
^  NH' 


Therefore  if  we  use  the  p-version  of  the  spectral  element  method  then  the  time-step 
must  be  decreased  quadratically  with  resolution;  however,  if  we  keep  N  fixed  then  we 
can  achieve  a  linear  decrease  of  the  time-step  with  resolution  which  will  allow  spectral 
element  models  to  compete  with  spectral  transform  models.  Let  us  now  compare  how 
the  time-steps  differ  for  NOGAPS  and  SEE-AM. 

The  explicit  leapfrog  version  of  NOGAPS  admits  a  time-step  3.5  times  larger  than 
the  explicit  leapfrog  version  of  SEE-AM  and  the  semi-implicit  version  of  NOGAPS 
admits  a  time-step  15  times  larger  than  SEE-AM.  Thus  for  SEE-AM  to  be  competitive 
with  NOGAPS,  it  must  be  far  more  efficient  on  a  per  grid  point  and  per  time-step 
basis.  In  Taylor  et  al.  (1997),  they  show  that  the  cost  per  grid  point  of  the  spectral 
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transform  method  is 


4.25  iViat  +  107  log  iViat 

and  that  of  the  spectral  element  is  192  for  N  =  8.  These  cost  estimates  are  com¬ 
puted  on  a  per  processor  basis.  Based  on  these  cost  estimates  the  explicit  SEE-AM 
model  will  outperform  the  explicit  NOGAPS  beyond  resolutions  of  T108  and  the 
semi-implicit  NOGAPS  beyond  T406.  Currently,  some  spectral  transform  models 
are  running  beyond  T406  such  as  the  European  Centre  for  Medium-Range  Weather 
Forecasts’  model  which  uses  T511  and  NOGAPS  is  expected  to  be  at  or  beyond  this 
resolution  in  the  near  future.  A  semi-implicit  implementation  of  SEE-AM  will  out¬ 
perform  the  semi-implicit  NOGAPS  beyond  a  resolution  of  T185  which  is  well  below 
the  current  operational  resolution  of  T239.  This  resolution  is  obtained  by  comparing 
the  serial  versions  of  our  semi-implicit  implementation  of  SEE-AM  in  which  we  use 
a  conservative  estimate  of  a  factor  of  two  increase  in  performance  over  the  explicit 
SEE-AM;  however,  this  does  not  necessarily  guarantee  that  we  will  achieve  this  gain 
in  the  parallel  version  but  it  does  provide  a  good  estimate.  Nonetheless,  much  work 
has  been  done  regarding  this  issue  and  we  hope  to  benefit  from  the  volume  of  work 
in  the  literature  on  this  topic,  most  notably  the  parallel  elliptic  solvers  of  Tufo  and 
Fischer  (1999)  for  the  Navier-Stokes  equations  and  the  work  by  Loft  et  al.  (2001)  and 
Thomas  et  al.  (2002)  for  the  multi-level  shallow  water  equations. 

In  Fig.  6  we  show  a  performance  comparison  on  an  IBM  SP3  for  the  explicit 
NOGAPS  and  SEE-AM  models;  both  models  use  a  resolution  of  T159  with  Wev  =  24 
vertical  levels  and  the  explicit  SEE-AM  time-step.  At  =  35  seconds.  The  results 
of  this  figure  are  summarized  as  follows.  First,  SEE-AM  (spectral  element)  is  much 
faster  than  NOGAPS  (spectral  transform)  on  a  per  time-step  and  per  processor  basis. 
For  an  equal  number  of  processors,  say  150,  SEE-AM  is  more  than  two  times  faster 
than  NOGAPS.  Second,  SEE-AM  can  use  many  more  processors  than  NOGAPS.  At 
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a  resolution  of  T159,  NOGAPS  can  only  use  150  processors  effectively;  note  that 
beyond  150  processors  the  performance  of  NOGAPS  decreases.  In  contrast,  SEE-AM 
is  able  to  use  600  processors  effectively  and  it  can  accommodate  up  to  2400  processors. 
Finally,  the  results  presented  in  Fig.  6  are  very  reassuring  because  they  show  that 
SEE-AM  scales  linearly  for  increasing  processor  number.  At  600  processors,  both  the 
model  and  the  communication  network  did  not  suffer  any  severe  penalties.  While 
we  should  not  expect  to  get  the  same  type  of  performance  on  2400  processors,  it  is 
exciting  to  anticipate  that  it  may  be  possible. 

The  results  in  Fig.  6  clearly  show  that  SEE-AM  is  far  more  efhcient  than  NOGAPS 
on  a  per  time-step  basis  and  per  processor  basis.  However,  NOGAPS  can  use  a  time- 
step  much  larger  than  SEE-AM.  In  Fig.  7  we  plot  the  results  for  the  semi-implicit 
NOGAPS,  explicit  NOGAPS,  and  explicit  SEE-AM  for  a  T159  horizontal  resolution 
with  24  vertical  levels.  The  results  shown  here  are  plotted  using  the  maximum  time- 
step  that  each  model  allows  and  they  are  At  =  540,  120  and  35  seconds  for  the 
semi-implicit  NOGAPS,  explicit  NOGAPS,  and  SEE-AM,  respectively. 

The  results  of  this  study  are  summarized  as  follows.  For  small  processor  num¬ 
bers  NOGAPS  outperforms  SEE-AM.  However,  for  processor  numbers  greater  than 
250  SEE-AM  outperforms  the  explicit  NOGAPS.  If  the  linear  scalability  of  SEE-AM 
were  to  hold  for  increasing  processor  numbers  we  would  expect  SEE-AM  to  outper¬ 
form  the  semi-implicit  NOGAPS  at  around  900  processors.  However,  if  we  could 
double  the  time-step  by  introducing  a  semi-implicit  implementation  then  SEE-AM 
would  outperform  the  semi-implicit  NOGAPS  beyond  500  processors.  It  is  possible  to 
further  increase  the  efhciency  of  SEE-AM  because  this  model  has  not  yet  been  fully 
optimized.  The  results  presented  in  this  section  should  not  be  taken  as  the  optimal 
performance  of  spectral  element  models  but  merely  as  a  first  attempt  at  constructing 
fast  and  efhcient  NWP  models.  We  hope  to  beneht  from  the  work  of  Thomas  et  al. 
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(2002)  and  Loft  et  al.  (2001)  on  the  optimization  of  spectral  element  models. 


6  Results 


In  this  section  we  validate  the  spectral  element  Eulerian  atmospheric  model  (SEE- 
AM)  using  barotropic  and  baroclinic  test  cases.  The  barotropic  cases  are  used  to 
confirm  the  exponential  accuracy  of  the  discrete  spectral  element  horizontal  operators. 
The  baroclinic  cases  are  used  to  validate  the  full  3D  primitive  hydrostatic  equation 
model.  In  order  to  judge  the  accuracy  of  the  model  we  plot  normalized  L2  error  norms 
defined  as  follows 


Ikcllna 


/iito  exact  qcf  dx 

ffl  Qexact^^ 


(55) 


where  qc  is  the  computed  solution  vector,  Qexact  is  the  exact  solution,  and  the  norm 
is  computed  as  a  broken  norm.  All  the  results  are  computed  using  64  bit  arithmetic 
precision. 


6.1  Barotropic  Tests 

To  validate  the  spectral  element  discrete  operators  we  run  the  model  using  the  shallow 
water  tests  1,  2,  and  3  in  Williamson  et  al.  (1992).  Because  these  tests  admit  exact 
solutions  we  are  able  to  plot  normalized  geopotential,  (j),  L2  error  norms.  Eigure 
8  shows  that  SEE-AM  achieves  the  expected  exponential  convergence  regardless  of 
whether  the  icosahedral  or  hexahedral  grid  is  used.  Case  1  will  not  yield  exponential 
convergence  due  to  the  non-smooth  nature  of  the  derivatives  at  the  base  of  the  cosine 
bell. 

In  Eigs.  9,  10,  and  11  we  compare  SEE-AM  on  hexahedral  grids  with  various 
other  models.  These  models  are  the  Jakob-Chien  et  al.  1995  (spectral  transform), 
Heikes  and  Randall  1995  (finite  difference),  Taylor  et  al.  1997  (spectral  element). 
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and  the  Tomita  et  al.  2001  (finite  difference)  models.  These  models  were  chosen 
because  they  are  representative  of  the  current  methods  being  explored  for  future  NWP 
and  climate  models.  The  Jakob-Chien  model  uses  the  same  horizontal  operators 
used  in  the  NOGAPS  (Hogan  and  Rosmond  1991),  European  Centre  for  Medium- 
Range  Weather  Forecasts  (ECMWF,  Simmons  et  al.  1989),  the  National  Center  for 
Environmental  Prediction  (NCEP,  Sela  1980),  and  National  Center  for  Atmospheric 
Research  (NCAR,  Hack  et  al.  1992)  spectral  transform  models.  The  Heikes  and 
Randall  model  uses  the  same  horizontal  operators  in  the  Colorado  State  University 
geodesic  grid  climate  model  (Randall  2002;  Ringler  2000)  and  is  similar  to  the  German 
Weather  Service  model  (GME,  Majewski  et  al.  2002).  The  Tomita  et  al.  (2001)  model 
uses  the  horizontal  operators  expected  to  be  used  in  the  Japanese  Earth  Simulator 
project.  Finally,  the  Taylor  et  al.  (1997)  model  uses  similar  horizontal  operators  to  the 
NCAR  spectral  element  dynamical  core  (Loft  et  al.  2001;  Thomas  et  al.  2002)  and  the 
Rutgers  University  spectral  element  ocean  model  (Iskandarani  et  al.  2002).  Although 
it  is  very  difficult  to  compare  different  models  we  have  chosen  to  use  equivalent 
horizontal  resolutions  based  on  the  number  of  grid  points.  Thus  for  grid  point  models 
we  compute  the  equivalent  hexahedral  resolution  by  using  Eqs.  (41)  and  (43)  such 
that  for  a  given  number  of  grid  points  A^p  we  get 


The  results  for  the  horizontal  operators  are  summarized  as  follows.  Figures  9 
and  10  show  that  for  Cases  1  and  2  SEE- AM  yields  the  best  accuracy  of  all  the 
models  including  the  high-accuracy  Jakob-Chien  et  al.  and  Taylor  et  al.  models.  For 
Case  1,  SEE-AM  is  almost  an  order  of  magnitude  more  accurate  than  the  Taylor 
et  al.  model  and  is  twice  as  accurate  as  the  Jakob-Chien  et  al.  model.  For  Case 
2,  SEE-AM  is  approximately  eight  orders  of  magnitude  more  accurate  than  these 
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two  high-accuracy  models.  The  plateauing  of  the  error  for  SEE-AM  is  due  to  the 
accuracy  reaching  machine  precision.  Finally,  Fig.  11  shows  that  for  Case  3  SEE-AM 
gives  better  accuracy  than  all  the  other  models  except  for  the  spectral  transform 
model  of  Jakob-Chien  et  al.  which  yields  a  slightly  more  accurate  result  (0.7  x  10“® 
compared  to  1  x  10“®). 


6.2  Baroclinic  Tests 

Because  there  are  no  analytic  solutions  to  the  full  atmospheric  equations  we  cannot 
run  test  cases  as  in  the  barotropic  case  and  compare  to  exact  solutions.  Instead,  we 
need  to  either  use  test  cases  in  which  the  outcome  is  a  simple  enough  pattern  that 
might  be  easily  discerned  beforehand  or  we  need  to  run  benchmark  test  cases  run  by  a 
vast  community.  We  have  chosen  to  use  both  types  of  test  cases:  the  Rossby-Haurwitz 
wave  and  the  balanced  initial  state  representing  the  former,  and  the  Held-Suarez  test 
and  the  baroclinic  instability  test  the  latter. 

No  diffusion  operators  are  included  in  any  of  our  results  for  both  NOGAPS  and 
SEE-AM.  At  every  time-step,  the  |  triangular  truncation  is  applied  to  NOGAPS  and 
the  element-wise  filter  is  applied  to  SEE-AM. 


6.2.1  Rossby-Haurwitz  Wave  Number  4 


In  order  to  judge  the  accuracy  of  SEE-AM,  we  compare  it  to  NOGAPS  for  the  Rossby- 
Haurwitz  wave  number  4.  This  test  does  not  have  an  analytic  solution  and  so  we  use 
it  for  qualitative  comparisons.  From  Monaco  and  Williams  (1975)  we  initialize  the 
model  as  follows:  the  wind  velocity  is 


(A,  (p,a)  —  —  \A  sin  (nA)  cos”'*'^  (p  —  nA  sin  (nA)  cos”  ^  p  sin^  p  —  Ba^  cos  p 

(X 

v{X,  p,a)  —  -  ^An  sin  p  cos”“^  p  —  nA  sin  (nA)  cos  (nA) 
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Va  e  [0, 1]  where 


and 

B  1  /  A\^  2/1^ 

As  —  — (2Q  +  B)  cos^  (D  -\ —  (  — ;7 )  cos^”  (D  (n  +  1)  cos^  w  +  (2n^  —  n  —  2) - - —  , 

2  4  \a^J  [  cos"^  (p 

+  2n  +  2)  -  (n  +  1)^  cos^  cp]  , 

[n  + 1)  [n  +  2)  J 

(^)  <1^  “  (^  +  2)]  • 

Surface  contours  of  the  prognostic  variables  after  a  5  day  integration  for  T159, 
Niey  —  24  resolutions  of  SEE-AM  and  NOGAPS  are  shown  in  Figs.  12  and  13,  re¬ 
spectively.  The  results  between  the  two  models  are  virtually  indistinguishable.  This 
means  that  both  models  yield  similar  values  for  all  of  the  prognostic  variables  as  well 
as  similar  phase  speeds  -  an  important  property  for  the  successful  tracking  of  tropical 
cyclones.  It  should  not  be  surprising  that  SEE-AM  gives  identical  results  to  NO¬ 
GAPS.  Both  models  use  the  same  temporal  and  vertical  discretization  methods.  The 
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only  difference  is  in  the  horizontal  discretization  methods.  However,  in  Section  6.1 
using  barotropic  test  cases  we  showed  that  the  spectral  element  method  gives  almost 
identical  accuracy  to  the  spectral  transform  method. 

Having  established  the  accuracy  of  the  model  let  us  now  turn  to  the  stability  of 
the  spectral  element  model  for  longer  time  integrations. 

6.2.2  Held-Suarez  Test  Case 

This  test  case  was  introduced  by  Held  and  Suarez  (1994)  and  has  been  the  most  widely 
used  test  for  dynamical  cores.  In  essence,  this  test  case  provides  a  good  platform  to 
assess  the  capabilities  of  the  model  in  simulating  a  realistic  climate  circulation.  Simple 
boundary  conditions  are  used  in  order  to  parameterize  the  radiative  forcing  at  the 
surface.  The  momentum  and  potential  temperature  equations  are  slightly  altered 
in  order  to  introduce  an  equilibrium  temperature  due  to  the  sub-grid  scale  physical 
processes  and  a  Rayleigh  damping  of  the  low-level  winds  is  included  to  represent 
boundary-layer  friction.  The  momentum  equation  is  now  defined  as  follows 

du 

-  =  ...-  Ku 

and  the  potential  temperature  is 

Bf) 

_  =  ke  {9  -  0eq) 

where  the  ellipses  denote  the  usual  terms  in  the  momentum  and  potential  temperature 
equations  (see  Held  and  Suarez  1994  for  the  values  of  ky,  kg,  and  ^eq)-  For  this  test 
we  use  an  equivalent  resolution  to  that  used  in  Held  and  Suarez  (1994),  namely  H64 
{uh  =  8  and  N  =  8)  with  20  vertical  levels  (iViev).  The  mean  zonally-averaged  zonal 
velocity  and  temperature  are  shown  as  a  function  of  the  vertical  coordinate  a  in  Fig. 
14.  These  plots  are  obtained  from  a  1200  day  integration  with  the  results  sampled 
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every  4  days  beginning  with  day  200.  These  results  compare  quite  well  with  those 
obtained  with  the  spectral  transform  model  in  Held  and  Suarez  (1994)  where  the 
mid-latitude  jets  in  the  upper  atmosphere  are  clearly  visible  (Fig.  14a)  and  a  realistic 
temperature  stratification  is  maintained  (Fig.  14b). 

6.2.3  Jablonowski-Williamson  Test  Cases 

The  following  two  cases  represent  a  new  set  of  tests  for  judging  the  accuracy  and  sta¬ 
bility  of  dynamical  cores.  These  tests  are  introduced  in  Jablonowski  and  Williamson 
(2002). 

The  surface  pressure  is  initially  given  as  Ps(A,  (p)  —  1000  hPa  and  the  initial  wind 
velocities  are  defined  as 

u  (A,  (f,  a)  —  uo  cos^  ay  sin^(2<^) 


V  (A,  (p,a)  —  0 


where  uq  —  35  y,  ay  —  (a  —  <7o)|  ,  ao  —  0.252,  and  a  is  the  vertical  coordinate.  The 
horizontally  averaged  temperature  is 


To  a  s 


m: 


for  a  >  at 


To  a  3  -\-  AT  {pt  —  cr)^  for  a  <  at 
where  Tq  =  288  K,  F  =  0.005  9  —  9.806  p,  AT  =  4.8  x  10^  K,  and  at  —  0.2  is  the 

tropopause  level.  Defining  the  following  functions 
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we  can  now  write  the  potential  temperature  as  follows 

9{X,  p,  a)  =  P(A,  p,  a)  T(a)  -h  7  T(A,  p,  a)  sin  ay  cos2 

4  Bd 


If  3  ,  . 

2  ay  2uo  COS2  (a^)Tc  -|-  Be 
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where  P  is  the  Exner  function  and  (a,  co)  are  the  radius  of  the  earth  and  its  angular 
velocity,  respectively.  The  surface  geopotential  is 


v)  =  “0  COS2  {Gs  -  (To)^ 


Mo  COS2  {Gs  -  (To)  ]  Ac  +  Be 


where  =  1.  Using  these  test  cases  we  compare  SEE- AM  with  three  well-established 
models. 


Balanced  Initial  State  For  this  test  case,  the  atmosphere  is  initially  balanced  by 
the  above  equations  for  surface  pressure,  wind  velocities,  (w,  n),  potential  temper¬ 
ature,  0,  and  surface  geopotential,  Using  these  initial  conditions,  the  equations 
should  remain  balanced  for  an  indehnite  amount  of  time.  Figure  15  shows  the  nor¬ 
malized  surface  pressure,  tt,  L2  error  norm  as  a  function  of  time  for  a  30  day  period  for 
SEE- AM  with  HI 60  horizontal  resolution  and  24  vertical  levels.  Note  that  while  the 
error  oscillates  with  time  it  remains  bounded  which  conhrms  that  the  initial  balanced 
state  is  maintained. 

Baroclinic  Instability  This  case  is  similar  to  the  balanced  initial  state  except  that 
now  a  perturbation  is  added  to  the  initial  zonal  velocity.  This  perturbation  is  given 
by 


where 

r  —  a  arccos  [sin  cpc  sin  cp  +  cos  cpc  cos  cp  cos(A  —  Ac)] , 

(Ac,  (pc)  —  (f ,  and  i?  =  ^  are  the  location  of  the  perturbation  and  its  radius. 

This  perturbation  grows  until  a  baroclinic  instability  develops  and  then  breaks 
near  day  nine.  Figure  16  shows  the  minimum  surface  pressure  Ps  as  a  function  of 
time  for  SEE-AM  against  various  models  including  the  NCAR  spectral  transform 
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model  (Hack  et  al.  1992),  the  NASA  Goddard  finite  volume  model  (Yeh  et  al.  2002), 
and  the  German  Weather  Service  icosahedral  hnite  difference  model  (Majewksi  et  al. 
2002)  which  we  denote  as  GME;  the  results  of  the  latter  three  models  are  courtesy 
of  Christiane  Jablonowski.  Figure  17  shows  a  zoomed  in  version  of  Fig.  16.  For  the 
grid  point  models,  we  use  the  dehnition  of  equivalent  hexahedral  resolution,  if,  which 
we  dehned  for  the  barotropic  test  cases.  The  results  of  this  case  are  summarized  as 
follows.  Figure  16  shows  that  all  four  models  are  in  complete  agreement  until  day  8, 
at  which  point  the  two  lower  order  models  (NASA  and  GME)  diverge  from  the  NCAR 
and  SEE-AM  models.  The  two  lower  order  models,  NASA  and  GME,  compare  well 
with  each  other  throughout  the  14  day  integration.  There  are  some  slight  deviations 
between  days  12  and  13  (Fig.  17)  but  overall  they  both  follow  the  same  pattern.  The 
two  higher  order  models,  NCAR  and  SEE-AM,  compare  extremely  well  with  each 
other  throughout  the  14  day  integration.  This  can  be  seen  more  clearly  in  Fig.  17 
where  the  pressure  curves  are  directly  on  top  of  each  other. 

7  Conclusion 

A  new  dynamical  core  constructed  using  the  spectral  element  method  based  on  3D 
Cartesian  coordinates  has  been  presented.  The  advantages  of  using  Cartesian  co¬ 
ordinates  are  the  elimination  of  the  polar  singularity,  the  flexibility  to  use  any  grid 
including  adaptive  unstructured  grids,  and  the  ease  with  which  the  Eulerian  model 
can  be  converted  to  a  semi-Lagrangian  form.  The  advantage  of  using  the  spectral  ele¬ 
ment  method  is  that  it  achieves  the  same  order  of  accuracy  as  the  spectral  transform 
method  while  taking  better  advantage  of  distributed-memory  computers. 

In  this  paper  we  show  results  for  icosahedral  and  hexahedral  grids  and  in  future 
work  we  expect  to  report  on  the  use  of  adaptive  unstructured  grids.  The  exponential 
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accuracy  of  the  spectral  element  method  was  illustrated  using  analytic  solutions  to 
barotropic  test  cases  which  confirmed  that  the  spectral  element  method  yields  at 
least  the  same  level  of  accuracy  obtained  with  the  spectral  transform  method.  Using 
baroclinic  test  cases,  we  demonstrated  that  our  spectral  element  atmospheric  model 
gives  similar  results  to  spectral  transform  models;  including  the  U.S.  Navy’s  NWP 
model  and  the  NCAR  climate  model.  Finally,  the  performance  of  the  spectral  element 
model  was  shown  to  scale  linearly  with  increasing  processors  -  a  trait  not  shared  by 
spectral  transform  models.  Through  our  comparison  of  NOGAPS  and  SEE-AM  we 
showed  why  the  spectral  element  model  will  outperform  spectral  transform  models 
for  the  types  of  horizontal  resolutions  required  by  future  NWP  applications. 

The  results  confirm  that  SEE-AM  offers  an  attractive  alternative  strategy  for 
constructing  future  NWP  and  climate  models  on  parallel  computers.  In  order  to 
make  SEE-AM  competitive  with  operational  NWP  spectral  transform  models  we  are 
extending  the  explicit  Eulerian  model  to  semi-implicit  Eulerian  and  fully-implicit 
semi-Lagrangian  and  we  hope  to  report  our  findings  in  the  future. 
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Appendix 

The  atmospheric  equations  can  be  written  in  the  following  conservation  form 

|  +  V.F  =  S(<,) 
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i,j,k  denote  the  Cartesian  directional  vectors,  and  the  fluxes  are 
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where  U  =  n  ■  u  and  the  maximum  wave  speed  of  the  atmospheric  equations  is 
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9  Figure  Captions 

Figure  1  The  contributions  of  the  local  element  matrices  are  summed  across  all 
elements  in  order  to  construct  the  corresponding  global  matrices;  this  is  the 
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global  assembly  procedure.  The  local  element  matrix  values  at  the  local  grid 
points  (4,  3,  2,  1)  of  elements  (El,  E2,  E3,  E4),  respectively,  are  summed  to 
obtain  the  value  of  the  global  matrix  at  the  global  grid  point  Gl.  These  local 
grid  points  are  in  fact  the  exact  same  point  which  in  the  global  indexing  is 
referred  to  as  Gl. 

Figure  2  The  equally-spaced  sigma  coordinate  system  used  in  the  vertical  discretiza¬ 
tion.  The  prognostic  variables  reside  at  the  full  levels  (solid  lines)  and  the 
diagnostic  variables  are  at  the  half  levels  (dashed  lines).  No-flux  boundary 
conditions  are  used  at  the  top  and  bottom  of  the  atmosphere. 

Figure  3  The  icosahedral  grid  for  a)  nj  =  2  iV  =  8  and  b)  nj  =  4  iV  =  8. 

Figure  4  The  hexahedral  grid  for  a)  njj  =  4  iV  =  8  and  h)  uh  —  8  N  —  8. 

Figure  5  The  communication  stencil  required  by  the  elements  (PI, . . . ,  P9)  in  pro¬ 
cessor  PROG.  iVl, . . . ,  N9  represent  the  elements  of  the  8  neighboring  proces¬ 
sors  (NBR).  The  dashed  box  represents  the  perimeter  values  that  each  processor 
sends  to  its  neighbors. 

Figure  6  The  simulation  days  per  wallclock  time  as  a  function  of  processors,  Aproc, 
for  the  explicit  NOGAPS  and  SEE- AM  for  T159  and  N\ev  —  24.  Both  models 
use  the  maximum  allowable  time-step  for  SEE-AM. 

Figure  7  The  simulation  days  per  wallclock  time  as  a  function  of  processors,  A^proc, 
for  the  semi-implicit  NOGAPS,  explicit  NOGAPS,  and  SEE-AM  for  T159  and 
A^iev  =  24.  Each  model  uses  its  maximum  allowable  time-step. 

Figure  8  Barotropic  Cases  1,  2,  and  3:  The  geopotential,  cj),  normalized  L2  error  as 
a  function  of  polynomial  order,  N,  for  the  Williamson  et  al.  shallow  water  tests 
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1,2,  and  3  using  a)  icosahedral  and  b)  hexahedral  grids.  The  rij  =  1  icosahedral 
grid  and  =  1  hexahedral  grid  are  used  and  the  results  are  reported  for  12 
days,  5  days,  and  5  days  for  Cases  1,  2,  and  3,  respectively. 

Figure  9  Barotropic  Case  1:  The  geopotential, normalized  L2  error  as  a  function 
of  horizontal  resolution,  ff,  for  the  Williamson  et  al.  shallow  water  case  1  after 
12  days  for  SEE- AM  (thick  line),  the  Jakob-Chien  et  al.  model  (spectral  trans¬ 
form),  the  Heikes  and  Randall  model  (hnite  difference),  and  the  Taylor  et  al. 
model  (spectral  element).  The  SEE- AM  model  uses  the  =  1  (A^e  =  6)  grid. 

Figure  10  Barotropic  Case  2:  The  geopotential,  (j)^  normalized  L2  error  as  a  func¬ 
tion  of  horizontal  resolution,  H,  for  the  Williamson  et  al.  shallow  water  case  2 
after  5  days  for  SEE-AM  (thick  line),  the  Jakob-Chien  et  al.  model  (spectral 
transform),  the  Heikes  and  Randall  model  (hnite  difference),  the  Taylor  et  al. 
model  (spectral  element),  and  the  Tomita  et  al.  model  (hnite-difference).  The 
SEE-AM  model  uses  the  =  1  =  6)  grid. 

Figure  11  Barotropic  Case  3:  The  geopotential,  (j)^  normalized  L2  error  as  a  func¬ 
tion  of  horizontal  resolution,  iJ,  for  the  Williamson  et  al.  shallow  water  case  3 
after  5  days  for  SEE-AM  (thick  line),  the  Jakob-Chien  et  al.  model  (spectral 
transform),  the  Heikes  and  Randall  model  (hnite  difference),  the  Taylor  et  al. 
model  (spectral  element),  and  the  Tomita  et  al.  model  (hnite-difference).  The 
SEE-AM  model  uses  the  =  1  (W  =  6)  grid. 

Figure  12  Rossby-Haurwitz  Wave  Number  4:  The  surface  a)  pressure  (hPa),  b) 
temperature  (K),  c)  zonal  velocity  (m/s),  and  d)  meridional  velocity  (m/s)  for 
SEE-AM  with  HIQO  {uh  =  20,  A^  =  8)  and  Wev  =  24  for  a  5  day  integration. 

Figure  13  Rossby-Haurwitz  Wave  Number  4:  The  surface  a)  pressure  (hPa),  b) 
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temperature  (K),  c)  zonal  velocity  (m/s),  and  d)  meridional  velocity  (m/s)  for 
NOGAPS  with  T159  and  A^iev  =  24  for  a  5  day  integration. 

Figure  14  Held-Suarez  Test:  Plots  of  the  a)  mean  zonally-averaged  zonal  velocity 
(m/s)  and  b)  mean  zonally-averaged  temperature  (K)  for  SEE- AM  after  1200 
days  for  H64  [uh  —  8^  N  —  8)  and  A^iev  =  20. 

Figure  15  Jablonowski- Williamson  Balanced  Initial  State:  The  normalized  surface 
pressure,  vr,  L2  error  as  a  function  of  days  for  SEE-AM  H160  (tih  —  20  and 
N  —  8)  with  24  vertical  levels. 

Figure  16  Jablonowski- Williamson  Baroclinic  Instability:  The  minimum  surface 
pressure  (hPa)  as  a  function  of  days  for  the  NASA  (finite  volume),  GME  (finite- 
difference),  NCAR  (spectral  transform),  and  SEE-AM  (spectral  element  with 
riH  —  20  and  N  —  8)  models  using  26  vertical  levels.  (The  data  for  the  first 
three  models  are  courtesy  of  Christiane  Jablonowski.) 

Figure  17  Jablonowski- Williamson  Baroclinic  Instability:  A  zoomed-in  view  of  Fig. 
16. 

10  Figures 
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Figure  1:  The  contributions  of  the  local  element  matrices  are  summed  across  all 
elements  in  order  to  construct  the  corresponding  global  matrices;  this  is  the  global 
assembly  procedure.  The  local  element  matrix  values  at  the  local  grid  points  (4,  3, 
2,  1)  of  elements  (El,  E2,  E3,  E4),  respectively,  are  summed  to  obtain  the  value  of 
the  global  matrix  at  the  global  grid  point  Gl.  These  local  grid  points  are  in  fact  the 
exact  same  point  which  in  the  global  indexing  is  referred  to  as  Gl. 
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Figure  2:  The  equally-spaced  sigma  coordinate  system  used  in  the  vertical  discretiza¬ 
tion.  The  prognostic  variables  reside  at  the  full  levels  (solid  lines)  and  the  diagnostic 
variables  are  at  the  half  levels  (dashed  lines).  No-flux  boundary  conditions  are  used 
at  the  top  and  bottom  of  the  atmosphere. 
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Figure  5:  The  communication  stencil  required  by  the  elements  (PI, . . . ,  P9)  in  pro¬ 
cessor  PROC.  iVl,...,iV9  represent  the  elements  of  the  8  neighboring  processors 
(NBR).  The  dashed  box  represents  the  perimeter  values  that  each  processor  sends  to 
its  neighbors. 
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Figure  6:  The  simulation  days  per  wallclock  time  as  a  function  of  processors,  Aproc, 
for  the  explicit  NOGAPS  and  SEE-AM  for  T159  and  A^iev  =  24.  Both  models  use  the 
maximum  allowable  time-step  for  SEE-AM. 


55 


Figure  7:  The  simulation  days  per  wallclock  time  as  a  function  of  processors,  Aproc,  for 
the  semi-implicit  NOGAPS,  explicit  NOGAPS,  and  SEE-AM  for  T159  and  A^iev  =  24. 
Each  model  uses  its  maximum  allowable  time-step. 
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Figure  8:  Barotropic  Cases  1,  2,  and  3:  The  geopotential,  (f),  normalized  L2  error  as 
a  function  of  polynomial  order,  N,  for  the  Williamson  et  al.  shallow  water  tests  1,  2, 
and  3  using  a)  icosahedral  and  b)  hexahedral  grids.  The  rij  =  1  icosahedral  grid  and 
nn  =  1  hexahedral  grid  are  used  and  the  results  are  reported  for  12  days,  5  days,  and 
5  days  for  Cases  1,  2,  and  3,  respectively. 
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Figure  9:  Barotropic  Case  1:  The  geopotential, normalized  L2  error  as  a  function 
of  horizontal  resolution,  H,  for  the  Williamson  et  al.  shallow  water  case  1  after  12 
days  for  SEE- AM  (thick  line),  the  Jakob-Chien  et  al.  model  (spectral  transform),  the 
Heikes  and  Randall  model  (finite  difference),  and  the  Taylor  et  al.  model  (spectral 
element).  The  SEE-AM  model  uses  the  nn  =  I  {N^  =  6)  grid. 
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Figure  10:  Barotropic  Case  2:  The  geopotential,  <p,  normalized  L2  error  as  a  function 
of  horizontal  resolution,  H,  for  the  Williamson  et  al.  shallow  water  case  2  after  5 
days  for  SEE-AM  (thick  line),  the  Jakob-Chien  et  al.  model  (spectral  transform), 
the  Heikes  and  Randall  model  (finite  difference),  the  Taylor  et  al.  model  (spectral 
element),  and  the  Tomita  et  al.  model  (finite-difference).  The  SEE-AM  model  uses 
the  nn  =  I  {N^  =  6)  grid. 
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Figure  11:  Barotropic  Case  3:  The  geopotential,  <p,  normalized  L2  error  as  a  function 
of  horizontal  resolution,  H,  for  the  Williamson  et  al.  shallow  water  case  3  after  5 
days  for  SEE-AM  (thick  line),  the  Jakob-Chien  et  al.  model  (spectral  transform), 
the  Heikes  and  Randall  model  (finite  difference),  the  Taylor  et  al.  model  (spectral 
element),  and  the  Tomita  et  al.  model  (finite-difference).  The  SEE-AM  model  uses 
the  nn  =  I  {N^  =  6)  grid. 
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Figure  12:  Rossby-Haurwitz  Wave  Number  4:  The  surface  a)  pressure  (hPa),  b) 
temperature  (K),  c)  zonal  velocity  (m/s),  and  d)  meridional  velocity  (m/s)  for  SEE- 
AM  with  if  160  {riH  =  20,N  =  8)  and  Wev  =  24  for  a  5  day  integration. 
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Figure  13:  Rossby-Haurwitz  Wave  Number  4:  The  surface  a)  pressure  (hPa),  b)  tem¬ 
perature  (K),  c)  zonal  velocity  (m/s),  and  d)  meridional  velocity  (m/s)  for  NOGAPS 
with  T159  and  iViev  =  24  for  a  5  day  integration. 
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Figure  14:  Held-Suarez  Test:  Plots  of  the  a)  mean  zonally-averaged  zonal  velocity 
(m/s)  and  b)  mean  zonally-averaged  temperature  (K)  for  SEE- AM  after  1200  days 
for  i?64  [uh  —  8,  N  —  8)  and  iViev  =  20. 
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Figure  15:  Jablonowski- Williamson  Balanced  Initial  State:  The  normalized  surface 
pressure,  tt,  I/2  error  as  a  function  of  days  for  SEE-AM  H160  {uh  =  20  and  N  =  8) 
with  24  vertical  levels. 
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Figure  16:  Jablonowski- Williamson  Baroclinic  Instability:  The  minimum  surface 
pressure  (hPa)  as  a  function  of  days  for  the  NASA  (finite  volume),  GME  (finite- 
difference),  NCAR  (spectral  transform),  and  SEE-AM  (spectral  element  with  uh  =  20 
and  N  =  8)  models  using  26  vertical  levels.  (The  data  for  the  first  three  models  are 
courtesy  of  Christiane  Jablonowski.) 
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